SUMMARY OF RESEARCH

Research Goals. The overall goal of the present research is to construct
a safe and effective human anthrax vaccine using recombinant DNA techniques.

We plan to isolate and characterize the Bacillus anthracis toxin genes for
protective antigen (PA), lethal factor (LF) and edema factor (EF). The
individual toxin genes will be cloned and expressed in E. coli and B.
subtilis. The toxin genes will be modified using site-specific mutagenesis
or deletion mutagenesis procedures to generate gene mutants which lack
biochemical activity but which are still fully immunogenic for use in a
recombinant vaccine. These mutant genes can then be inserted back into B.
anthracis Sterne with the selective removal of the wild-type genes. These
mutant B. anthracis strains will be tested in animals, such as the mouse or
guinea pig, for vaccine efficacy.

We will also characterize the B. anthracis plasmids pXO1 and pXO2 (1-3).
Since we plan to insert the toxin genes back into B. anthracis to
construct a recombinant vaccine host, we need a complete restriction
map of pXO1, which contains the toxin genes. In addition, in order to
understand the expression of the toxin genes and of the capsule (2-4), we
need to physically characterize these plasmids as completely as possible.

Research Achievements. During the course of this contract, we have
isolated and characterized each of the B. anthracis toxin genes. The PA
(pag) gene was cloned and initially characterized in the Bacteriology
Division of USAMRIID (5), which also determined the DNA sequence of pag
(6). The cloning and characterization of the EF (cya) and LF (lef) genes
were performed in my laboratory (7,8). The DNA sequences for the cya (9)
and lef (unpublished data of author) genes have also been completed in my
laboratory.

An improved method for the isolation of large quantities of pXO1 and
pXO2 from B. anthracis strains was developed at Brigham Young University
(10). Initial restriction enzyme cleavage maps have also been constructed.
We have also initiated mutagenesis procedures for the modification of each
of the toxin genes. These mutants are currently being tested for biochemical
activity. In addition, these gene mutants are being inserted into B.
subtilis to produce larger quantities of these proteins and for vaccine testing.

The investigators (Principal Investigator and Graduate Students) have
abided by the National Institutes of Health Guidelines for Research Involving
Recombinant DNA Molecules (May, 1986). Supplemental guidelines pertaining
to the subcloning of the individual B. anthracis toxin genes in
sporulation-competent B. subtilis were approved by the NIH committee on
toxins on March 13, 1986. All recombinant DNA research has also been
registered with and approved by the Brigham Young University Institutional
Biosafety Committee.

RESULTS

Isolation and characterization of the edema factor gene (cya). The
edema factor is a calmodulin-dependent adenylate cyclase (11,12). We have
cloned and sequenced the EF gene (cya). The DNA sequence and deduced amino
acid sequence (9) were reported in previous annual reports and are shown
in Appendices I and II. A paper describing the cloning and expression of EF
in E. coli has been published (8), and a manuscript describing the DNA
sequence and its deduced amino acid sequence has been submitted to Gene and
should soon be accepted (9).

Several interesting structural features of EF are apparent from its deduced
amino acid sequence. (i) EF apparently contains a 33 amino acid signal
peptide which conforms to known Bacillus leader sequences in that it starts
with charged (mostly positive) and hydrophilic residues (amino acids 1-10),
followed by a central core of hydrophobic amino acids (residues 11-23) and
then several hydrophilic residues (amino acids 24-33) prior to the start of
the mature protein. Proteolytic cleavage apparently occurs at an Ala-Met
peptide bond, near the start of a proposed α-helix (see Figure 4A), consistent
with signal processing after an Ala or Gly in bacilli (13). PA apparently
contains a 29 amino acid leader sequence (6) and LF appears to contain a
33 amino acid leader (see below). Figure 4B shows a comparison between the
amino acid sequences near the ends of the EF, PA and LF signal peptides and
the apparent position of proteolytic cleavage. Similar amino acids at the
ends of these signal peptides may be required for signal peptidase recognition
or for secretion. (ii) A very strong Bacillus ribosome binding site
(AAAGGAGGT) is present immediately upstream from the start of the EF protein
coding region; it is similar to the identical PA and LF ribosome binding
sites (AAAGGAG). (iii) Amino acid residues 347 to 355 of the EF-precursor
protein contain the sequence Gly-x-x-x-x-Gly-Lys-Ser (where x = any amino
acid), which is a perfect match to a consensus sequence present in prokaryotic
and eukaryotic ATP and GTP binding proteins (14). The Lys residue is part
of the ATP binding site of these proteins and appears to be part of the EF
ATP binding site as well: using site-specific mutagenesis procedures,
we have replaced this Lys within EF with an Asn, and cyclase activity was
reduced 90-95% (unpublished data of author). (iv) We have also identified
a domain in EF which could represent its putative calmodulin-binding site.
As described in the EF sequencing paper (9), calmodulin-binding proteins
often contain an α-helical region with charged or hydrophilic residues on
one side and hydrophobic residues on the other. Such an amphiphilic helical
region is present in EF between amino acid residues 313-323 of the
EF-precursor (see Appendix II). (v) No homology was observed between the EF
gene or its deduced amino acid sequence and either the E. coli or yeast
adenylate cyclases. However, there are at least three regions of
homology in the amino acid sequence between EF and the B. pertussis
calmodulin-dependent adenylate cyclase. The putative calmodulin-binding
site, identified above, is conserved in the B. pertussis adenylate cyclase
as well (15,16).
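
The three-part layout of the EF signal peptide described in (i) can be checked numerically. The sketch below is illustrative only and is not the analysis used in this work: it averages Kyte-Doolittle hydropathy values (a standard scale that this report does not name, so its use here is an assumption) over the three segments given in the text, using residues 1-33 of the EF precursor as listed in Appendix II.

```python
# Kyte-Doolittle hydropathy values (assumption: the report does not name a scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residues 1-33 of the EF precursor as listed in Appendix II.
EF_SIGNAL = "MTRNKFIPNKFSIISFSVLLFAISSSQAIEVNA"


def mean_hydropathy(seq, start, end):
    """Mean hydropathy over residues start..end (1-based, inclusive)."""
    segment = seq[start - 1:end]
    return sum(KD[aa] for aa in segment) / len(segment)


# Segment boundaries are the ones given in the text.
n_region = mean_hydropathy(EF_SIGNAL, 1, 10)   # charged, hydrophilic start
h_region = mean_hydropathy(EF_SIGNAL, 11, 23)  # hydrophobic core
c_region = mean_hydropathy(EF_SIGNAL, 24, 33)  # hydrophilic end

for label, value in (("1-10", n_region), ("11-23", h_region), ("24-33", c_region)):
    print(f"residues {label}: mean hydropathy {value:+.2f}")
```

On this scale the central segment averages about +2.5, while both flanking segments average at or below zero, which agrees with the charged/hydrophobic/hydrophilic description above.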

As mentioned above, we have also compared the EF amino acid sequence
with the calmodulin-dependent adenylate cyclase of Bordetella pertussis,
the causative agent of whooping cough. The pertussis cyclase appears to
function independently of the pertussis toxin, but is a required virulence
factor since strains which lack cyclase activity are avirulent (17).
Glaser et al. (16) recently showed that the cyclase catalytic domain is
about 450 amino acids in length and is part of a larger precursor polypeptide
of 1706 amino acids. We performed a homology search between the entire EF
(800 amino acids) and the pertussis cyclase (1706 amino acids). Three major
regions of homology (labeled #1, #2 and #3 in Appendix III) were observed.
These homologous domains are part of the catalytic domain of the pertussis
cyclase and are located within the carboxyl terminal 500 amino acids of EF.
Domain #1 contains the consensus ATP binding site, which is surrounded by
highly conserved amino acids. This high degree of amino acid conservation
indicates a close evolutionary relatedness for these two proteins. The
putative calmodulin-binding site is conserved in these proteins and is
shown in Appendices II and III.

Characterization of the LF gene (lef). We have also cloned the B.
anthracis LF gene (lef) and have determined its entire DNA sequence. We
easily identified the start of the LF gene since the first 15 amino acids
of the mature LF were previously determined by Dr. J. Schmidt (USAMRIID).
The LF DNA sequence and the deduced amino acid sequence are shown in Appendix
IV. The LF gene contains a good ribosome binding site (AAAGGAG) which is
identical to the proposed PA gene ribosome binding site. The LF-precursor
apparently contains a 33 amino acid signal sequence (see Figure 4A) which
is removed during secretion. This signal sequence conforms to consensus
Bacillus leader peptides (and to the EF and PA signal peptides) in that it
starts with a polar or charged region followed by 23 non-polar, hydrophobic
amino acid residues. After this 33 amino acid leader peptide, the next 16
amino acids correspond exactly to the LF amino acid sequence determined by
Dr. Jim Schmidt (USAMRIID), except for one amino acid. Amino acid position
+10 of the mature protein (+43 of the LF-precursor) is a His (based on the
DNA sequence) whereas it was previously reported to be a Lys (based on LF
protein sequencing). Interestingly, there is a single Cys in the LF leader,
although no Cys residues are in the mature protein. The entire protein
sequence of LF is also shown in Appendix V.
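
The two numbering schemes used above (mature protein versus precursor) differ only by the length of the leader peptide. The following is a small sketch of that conversion, not part of the original analysis. The 60-residue string is the start of the LF precursor as it appears in Figure 4A (one character repaired from the scan, which is an assumption), and the function name is hypothetical.

```python
LF_LEADER_LENGTH = 33  # signal peptide length stated in the text

# First 60 residues of the LF precursor as printed in Figure 4A.
LF_PRECURSOR_START = (
    "MNIKKEFIKVISMSCLVTAI" "TLSGPVFIPLVQGAGGHGDV" "GMHVKEKEKNKDENKRKDEE"
)


def precursor_position(mature_position, leader_length):
    """Convert a 1-based position in the mature protein to precursor numbering."""
    return mature_position + leader_length


pos = precursor_position(10, LF_LEADER_LENGTH)
residue = LF_PRECURSOR_START[pos - 1]          # strings are 0-based
leader = LF_PRECURSOR_START[:LF_LEADER_LENGTH]
mature_start = LF_PRECURSOR_START[LF_LEADER_LENGTH:]

print(f"mature +10 = precursor +{pos}, residue {residue}")
print(f"Cys in leader: {leader.count('C')}, Cys in mature residues shown: {mature_start.count('C')}")
```

With the Figure 4A residues, position +10 of the mature protein maps to +43 of the precursor and is a His, and the only Cys within the first 60 residues lies in the leader. The same arithmetic gives the EF precursor length: a 33 residue leader plus the 767 residue mature protein is 800 amino acids.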

There appears to be extensive amino acid homology between LF and EF
in the first 300 amino acids of each protein. We have detected 10 closely
related domains, and three of these highly conserved domains are underlined
(labelled #1, #2 and #3) in Appendix II and Appendix V. These homologous
regions could represent domains which are required for association with PA
prior to cellular uptake. Since these conserved domains in LF and EF are
charged, association with PA may occur through a series of electrostatic
interactions.

Mutagenesis of the anthrax toxin genes. Using site-specific mutagenesis
procedures, we have altered the EF gene in order to modify its enzyme
activity and to construct EF expression vectors. First, the previously
identified ATP binding domain in EF, which conforms to the consensus ATP
binding site of other prokaryotic and eukaryotic ATP and GTP binding proteins
(14), has a Lys residue which is involved in ATP binding. This amino acid
was changed to an Asn in EF. When this mutant EF was isolated from E.
coli, adenylate cyclase activity was reduced about 90-95%, indicating that
this Lys is probably involved in ATP binding. However, since activity
was not totally abolished, other residues are probably also involved. Of
particular interest is the presence of a His two residues prior to this Lys.
This His is also conserved in the B. pertussis adenylate cyclase (see the ATP
binding domain in Appendix III).
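
The position of the consensus site and the effect of the Lys to Asn substitution on the motif can be illustrated with a short pattern search. This is a sketch under stated assumptions, not the procedure used in the laboratory: the fragment is taken from the Appendix II listing (residues 340-366 of the EF precursor, as read from the scan), and the function names are hypothetical.

```python
import re

# EF-precursor residues 340-366 as listed in Appendix II (assumed numbering).
FRAGMENT_START = 340
EF_FRAGMENT = "KSGVATKGLNEHGKSSDWGPVAGYIPF"

# Consensus given in the text: Gly-x-x-x-x-Gly-Lys-Ser, x = any amino acid.
CONSENSUS = re.compile(r"G.{4}GKS")


def locate_consensus(fragment, first_residue):
    """Return (start, end, lys) in precursor numbering, or None if absent."""
    match = CONSENSUS.search(fragment)
    if match is None:
        return None
    start = first_residue + match.start()
    end = first_residue + match.end() - 1
    return start, end, end - 1  # the Lys is the next-to-last motif residue


def substitute(fragment, first_residue, position, new_residue):
    """Replace the residue at a precursor-numbered position."""
    index = position - first_residue
    return fragment[:index] + new_residue + fragment[index + 1:]


start, end, lys = locate_consensus(EF_FRAGMENT, FRAGMENT_START)
mutant = substitute(EF_FRAGMENT, FRAGMENT_START, lys, "N")
two_before = EF_FRAGMENT[lys - 2 - FRAGMENT_START]

print(f"consensus at {start}-{end}, Lys at {lys}, residue two before: {two_before}")
print("consensus still present in mutant:", CONSENSUS.search(mutant) is not None)
```

With the Appendix II numbering as read here, the eight-residue match runs from 347 to 354 with the Lys at 353 and a His at 351, and the Asn substitution removes the match to the consensus.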

We have also removed a BglII cleavage site within the EF gene and
inserted a new BglII recognition site immediately prior to the start of
the protein coding sequence. In another experiment, we inserted a BglII
cleavage site immediately downstream from the PA promoter so that we could
fuse the PA promoter to the EF gene. This hybrid toxin gene, when inserted
into pBS42 (18) and transformed into B. subtilis, expressed EF at a level
at least as great as B. anthracis Sterne. We are in the process of determining
the precise amount produced using an ELISA or Western blot. EF was secreted
from B. subtilis and was enzymatically active in an adenylate cyclase
assay. Since PA expression is regulated by bicarbonate (19) in B. anthracis
(Dr. J. Bartkus, USAMRIID, personal communication), we are attempting to
transfer this PA promoter-EF gene plasmid into B. anthracis by electroporation.
We hope that this plasmid, when introduced into B. anthracis, will produce
regulated, high levels of EF for purification and analysis. EF gene mutants
can also be generated and transferred to B. anthracis using this plasmid
construction.

Several mutagenesis experiments have also been initiated with the PA
gene. Since expression of PA in B. anthracis appears to be significantly
greater than that of either LF or EF, we are fusing the PA promoter to both
the EF and LF genes for higher levels of expression. In addition, we have
started experiments to alter PA itself. Specifically, we are mutating the
Arg-Lys-Lys-Arg sequence (Dr. S. Leppla, USAMRIID, personal communication)
in PA, which is cleaved by a trypsin-like enzyme when PA is bound to its
cellular receptor. After cleavage, the amino terminal 20,000 daltons of PA
are removed and PA can then bind either LF or EF. Therefore, if cleavage is
prevented, LF or EF will not bind and cannot enter the cell. We will alter
the amino acids at this location in PA to examine the specificity of cleavage
and to substitute amino acids which could prevent cleavage. These alterations
should also prevent the binding of LF or EF and make these toxin components
essentially inactive.

Transcription start sites for the anthrax toxin genes. We have used
radiolabeled oligonucleotides, specific for each of the different toxin
genes, to determine the start site for toxin gene transcription. Using
mRNA (isolated from B. anthracis Sterne) as template, each oligonucleotide
was used to prime DNA synthesis (using reverse transcriptase) towards the
5'-end of the respective toxin mRNA. This newly synthesized radioactive
DNA was denatured and electrophoresed on a denaturing polyacrylamide gel.
Using this approach, we have successfully identified the start sites for PA
and LF gene transcription. The PA promoter is apparently located immediately
upstream from the start of its coding region, with transcription starting
about 25 bases before the first start codon for PA translation (6). Likewise,
the apparent start for LF gene transcription occurs 25 bases prior to the
ATG start codon for LF translation (about nucleotide 456 in Appendix IV).
We have not yet been able to localize the start of EF gene transcription.
This failure is probably due to the low level of EF mRNA produced in B.
anthracis, which is at least 10-fold lower than either the PA or LF mRNA
concentrations (unpublished data of author).
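
In a primer extension experiment of this kind, the length of the extended DNA product fixes the position of the mRNA 5'-end relative to the primer. The sketch below shows only that arithmetic. The primer coordinate, product length and ATG coordinate are hypothetical values chosen so that the result matches the LF figures quoted above (start near nucleotide 456, 25 bases before the ATG); they are not data from this report.

```python
def transcription_start(primer_5prime_coord, product_length):
    """Template coordinate of the mRNA 5'-end.

    primer_5prime_coord: gene coordinate matched by the 5'-end of the primer
    product_length: length of the extended DNA in nucleotides, primer included
    """
    return primer_5prime_coord - product_length + 1


def leader_length(start_coord, atg_coord):
    """Number of bases between the transcription start and the ATG."""
    return atg_coord - start_coord


# Hypothetical example values (not measurements from this report).
PRIMER_5PRIME_COORD = 555
PRODUCT_LENGTH = 100
ATG_COORD = 481

start = transcription_start(PRIMER_5PRIME_COORD, PRODUCT_LENGTH)
print(f"transcription start at nucleotide {start}")
print(f"{leader_length(start, ATG_COORD)} bases before the ATG")
```

The product length is read from the denaturing gel against a size standard, so the precision of the mapped start site depends on how well the band can be sized.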

Expression of toxin genes in B. subtilis and B. anthracis. In an
effort to express the anthrax toxin genes in B. subtilis, we have cloned
each of the toxin genes into B. subtilis expression plasmids. Initially,
we fused these genes to a regulated promoter and a good ribosome binding
site which are present in pSI-1 (20). Using site-specific mutagenesis
procedures, we have introduced new XbaI recognition sites immediately before
the start codons for the PA, EF and LF genes. Following cleavage with
XbaI, each of the toxin genes was ligated into plasmid pSI-1. When these
constructs are transformed into B. subtilis, transcription of the inserted
toxin genes is regulated by the lac repressor and IPTG (18,20). For example,
the amount of PA produced by this fusion was close to the expression of PA
from PA1 (21).

We have also constructed a plasmid using the T7 promoter cloned upstream
from the toxin gene. We cloned the T7 RNA polymerase gene (22) into pSI-1
so that transcription would be controlled by the lac promoter, which is
inducible with IPTG. Part of this recombinant plasmid, which contained the
T7 polymerase gene and the erythromycin resistance gene from pE194, was
integrated into B. subtilis genomic DNA (23,24). B. subtilis with this
DNA should express T7 RNA polymerase after the addition of IPTG. These
cells can then be transformed with a replication-competent plasmid containing
one of the B. anthracis toxin genes (e.g., cya, pag, or lef) cloned downstream
from the T7 promoter for gene expression. Although we have not yet tested
these recombinants in B. subtilis, plasmids containing the toxin genes express
toxin in E. coli using the T7 polymerase (21). B. subtilis containing these
plasmids should produce high level, regulated expression of the toxin genes
in a safe bacterial host. Toxin is secreted from B. subtilis and can be
used for purification of individual toxin components.

Isolation and characterization of pXO1 and pXO2. We have developed an
efficient plasmid isolation procedure to isolate pure supercoiled pXO1 and
pXO2 DNA. This procedure involves chromatography using NACS-37 resins and
effectively separates small amounts of genomic DNA from plasmid (10). Our
purification protocol does not use CsCl buoyant density gradients since these
large plasmids are easily sheared, converting them from supercoiled to
relaxed or linear DNA. A typical yield of pXO1 from a one liter culture of
B. anthracis was about 200 µg, which is close to the maximum amount of DNA
expected per liter of culture if these plasmids were present as single copies
within B. anthracis cells.
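
The single-copy ceiling mentioned above can be estimated from the plasmid size and the number of cells in the culture. The sketch below is a rough estimate only. The plasmid size (about 180 kb) and the culture density (about 10^9 cells per ml) are assumptions that are not stated in this report, and the average mass of 650 g/mol per base pair is the usual approximation for double-stranded DNA.

```python
AVOGADRO = 6.022e23        # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0  # approximate average for double-stranded DNA


def single_copy_yield_ug(plasmid_bp, cells_per_ml, culture_ml=1000.0):
    """Mass of plasmid DNA (micrograms) if every cell carries one copy."""
    grams_per_plasmid = plasmid_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    total_cells = cells_per_ml * culture_ml
    return grams_per_plasmid * total_cells * 1e6


# Assumed values, not taken from this report.
ASSUMED_PLASMID_BP = 180_000
ASSUMED_CELLS_PER_ML = 1e9

estimate = single_copy_yield_ug(ASSUMED_PLASMID_BP, ASSUMED_CELLS_PER_ML)
print(f"expected single-copy yield: about {estimate:.0f} micrograms per liter")
```

Under these assumptions the estimate is a little under 200 micrograms per liter. The result scales linearly with both the plasmid size and the cell density, so a denser or thinner culture shifts the ceiling proportionally.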

Using pure pXO1 and pXO2, we characterized these DNAs using thermal
denaturation and buoyant density procedures. Using a Tm analysis, the
melting temperatures for pXO1 and pXO2 were 82.5°C ± 0.3°C and 82.2°C ±
0.3°C, respectively. These values correspond to GC contents of 32.2% for
pXO1 and 31.5% for pXO2. Similar experiments using CsCl banding gave
GC contents of 31.1% for pXO1 and 31.4% for pXO2. These values are close to
the GC% of B. anthracis genomic DNA, which is 32.2%.
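
The conversion from melting temperature to GC content can be reproduced with the Marmur-Doty relation, Tm = 69.3 + 0.41 (%GC), which applies to DNA melted in standard saline citrate. The report does not say which relation or buffer was used, so both are assumptions here. The sketch is included because this relation gives the GC values quoted above.

```python
# Marmur-Doty relation for DNA in standard saline citrate (assumed, see lead-in):
#   Tm (deg C) = 69.3 + 0.41 * (%GC)
INTERCEPT_C = 69.3
SLOPE_C_PER_PERCENT = 0.41


def gc_percent_from_tm(tm_celsius):
    """Estimate %GC from a melting temperature in degrees Celsius."""
    return (tm_celsius - INTERCEPT_C) / SLOPE_C_PER_PERCENT


for name, tm in (("pXO1", 82.5), ("pXO2", 82.2)):
    print(f"{name}: Tm {tm} C -> {gc_percent_from_tm(tm):.1f}% GC")
```

Because the slope is 0.41 degrees per percent GC, the stated uncertainty of ± 0.3°C in Tm corresponds to roughly ± 0.7% in GC content, so the difference between the two plasmids is within the measurement error.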

The restriction maps for pXO1 and pXO2 have been determined for several
enzymes which cleave only a few times, such as PstI, BamHI, ClaI, SstI, BglII
and PvuII (Figures 1 and 2). Experiments to map the more frequently cutting
enzymes, such as EcoRI and HindIII, are presently being completed. We have
generated recombinant DNA libraries for pXO1 and pXO2 in bacteriophage λ as
well as in plasmids in order to generate a complete map for the most common
restriction enzymes. A detailed restriction enzyme map of the LF and PA
gene regions on pXO1 is also shown in Figure 3.

In a final effort to generate a complete gene map of pXO1 and pXO2, we
are identifying the number and location of the different RNA transcripts from
these plasmids. This project involves the identification of the different
promoters and the RNAs made from them. Basically, we are cleaving pXO1 and
pXO2 with an enzyme which cleaves these DNAs many times, such as MboI or Sau3A,
generating DNA fragments which can ligate to BamHI-cleaved plasmids. Using
B. subtilis plasmids which have been cleaved with BamHI at a site located
prior to a promoterless chloramphenicol resistance gene (25), we will insert
the pXO1 or pXO2 DNA fragments into these promoter identification plasmids.
After transformation of these recombinant plasmids into B. subtilis, we will
identify bacteria which are now resistant to chloramphenicol. These plasmids
will contain a functional promoter (derived from pXO1 or pXO2) driving the
transcription of the chloramphenicol resistance gene. The recombinant DNA
inserts prepared from these promoter expression plasmids will then be
mapped on pXO1 or pXO2. The size and direction of RNA transcription will
also be determined. This procedure is very powerful and should allow us to
identify and position most, if not all, of the functional promoters from
the B. anthracis plasmids, assuming that all these promoters will also
function in B. subtilis. However, with the recent discovery that we can
transform B. anthracis using electroporation, we will also be able to
transfer these promoter plasmids to B. anthracis for promoter identification
directly in the parent organism.
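
The fragments can be ligated into a BamHI-cleaved vector because MboI and Sau3A (which cut before GATC) and BamHI (which cuts G^GATCC) all leave the same four-base 5' overhang. The sketch below illustrates this on a short made-up sequence. The sequence and the function names are hypothetical and stand in for plasmid DNA only for the purpose of the example.

```python
def cut_before(seq, site):
    """Split seq immediately 5' of every occurrence of site (top strand only)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    bounds = [0] + [p for p in positions if p > 0] + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def five_prime_overhang(site, cut_index):
    """5' overhang left by a palindromic site cut at cut_index on each strand."""
    return site[cut_index:len(site) - cut_index]


# Hypothetical toy sequence, not plasmid data.
TOY_DNA = "ATTAGATCTTTAAAGGATCCATATGATCAAT"

fragments = cut_before(TOY_DNA, "GATC")           # MboI / Sau3A digestion
frequent_cutter_end = five_prime_overhang("GATC", 0)
bamhi_end = five_prime_overhang("GGATCC", 1)

print(fragments)
print("compatible ends:", frequent_cutter_end == bamhi_end)
```

A four-base recognition site occurs far more often than a six-base site, which is why these enzymes cleave the plasmids many times and yield the small fragments suited to this kind of promoter screen.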

CONCLUSIONS

It appears from the data described in this report that most of the
experiments outlined in the original research proposal are essentially
completed. (i) The anthrax toxin genes have each been cloned. (ii) Each of
the toxin genes has also been sequenced, so we will be able to study gene
expression and to characterize the toxin proteins better. (iii) We can
generate toxin gene mutants for the construction of a safe vaccine and to
elucidate the biochemical activities of these proteins. (iv) We have
expressed the anthrax toxin genes in E. coli and B. subtilis and have
constructed expression vectors, especially for B. subtilis and B. anthracis,
which should allow for high level expression of the toxin proteins for
biochemical and immunological purposes. (v) We have determined homology
between EF and the pertussis calmodulin-dependent adenylate cyclase, which
should allow us to better characterize EF based on conserved domains. In
addition, homology between LF and EF should allow us to examine the
interaction between these proteins and PA. (vi) We have not yet placed
mutant toxin genes back into B. anthracis, although the wild-type PA and EF
genes have been transferred. Overall, our research has allowed us to
characterize the anthrax toxin genes and to construct important gene mutants.
This research is absolutely required for the construction of a safe
recombinant DNA derived anthrax vaccine.

FIGURE 1. Restriction map of pXO1. The positions of the LF, PA and
EF genes are depicted. The sizes of the DNA fragments for each enzyme are not
included due to lack of space.

FIGURE 3. Restriction map of the PA and LF gene regions on pXO1.


(A)  The signal peptides (preceding the arrow) for EF, PA and LF are
     shown. The proposed secondary structure most likely to be assumed
     for the first 60 amino acids of each protein is shown (α = α-helix;
     β = β-sheet; t = β-turn; blank = random coil). The amino terminal
     amino acid, as determined by Dr. J. Schmidt (USAMRIID), for each
     mature toxin protein is also shown.

     EF signal peptide (↓ start of mature EF)

       1  MTRNKFIPNKFSIISFSVLLFAISSSQAIEVNA↓MNEHYTESDIKRNHKTEKNKTEKEKFK  60

     PA signal peptide (↓ start of mature PA)

       1  MKKRKVLIPLMALSTILVSSTGNLEVIQA↓EVKQENRLLNESESSSQGLLGYYFSDLNFQA  60

     LF signal peptide (↓ start of mature LF)

       1  MNIKKEFIKVISMSCLVTAITLSGPVFIPLVQG↓AGGHGDVGMHVKEKEKNKDENKRKDEE  60

(B)  The amino acid sequence at the end of each anthrax toxin signal
     peptide is shown. Cleavage occurs after Ala or Gly, consistent
     with known cleavages after bacilli signal peptides (14). Similar
     amino acids at the end of the signal peptides (denoted with a
     vertical bar [|]) probably represent signal peptidase recognition
     sequences. The numbers (-1 or +1) indicate the last amino acid
     of the signal peptide and the first amino acid of the mature
     toxin protein, respectively.

                                      -1   +1
     EF signal peptide    Glu-Val-Asn-Ala--Met
                               |   |   |
     PA signal peptide    Val-Ile-Gln-Ala--Glu
                           |   |   |   |
     LF signal peptide    Leu-Val-Gln-Gly--Ala

FIGURE 4. Anthrax toxin signal peptides.

APPENDIX I: Nucleotide Sequence of the EF gene.

10  20  30  40  50  60  70  80  90 

TTACTTTTTIMXDVCTCAAmAAAAGTCCAAGCACTIATATCXriAATAGATGCOTrCT^^ 

100  110  120  130  I  140  150  160  170  180 

(iVUIAKTIAIGACAOGTnXXXXrEAAaTCCTGAATTCAAAATCOQACTTAGAAAIACAC^^ 

190  200  210  220  230  240  250  260  270 

CT(nAaXniTITmCTAAAIAAACGAAATCAGTCIAAAAATCAACAGCTGAACTrTAICAACTIAGM 

280  290  300  310  320  330  340  350  360 

(X:CTACXnX71Tm^.tn7W^IGT^IX^AIT^CTAAAIAIA^ITAAATATGAATICTL^GCTGTOI^^ 

-35  (putative  promoter  site)  -10

370  380  390  400  410  420  430  440  450 

GATIAXATriCnAAAIAAAATTGlTW^TmAiCAIGTAGAAlAAAGAGATITnACTmiATlTU^^ 


460  470  480  490  500  510  520  530  540 

ATCriX^TITCTAAATIAGTmAAAIAAAAAAGAAGGATITGCnXviGACrreAGATC^^ 


ribosome  binding  site 

+1  550  560  570  580  590  600  610  620  630 

AGAATCAClAGAAATAMTTIATACCTAAIAAGmAGTATTAIAlCCrmCA(n:AmClAriTGClATAT^^ 

MetThrArgAsnLysPheIleProAsnLysPheSerIleIleSerPheSerValLeuLeuPheAlaIleSerSerSerGlnAlaIle
33  amino  acid  leader  sequence


640  650  660  670  I  680  690  700  710  720 

GAAGTAAATGClnAIGAATGAACATIACACrcAGAiOICAtATIAAAACIAAACCAIAAAACTGAAAAAAATAAAACT^ 


1st  amino  acid  of  EF  I 

730  740  750  760  j  770  780  790  800  810 

AAAGACACTIATlAAIAACTIACrriAAAACAGAATITACCAAIOAAACTITAGAIAAAAXACAGCACACACM^ 
LysAspSerIleAsnAsnLeuValLysThrGluPheThrAsnGluThrLeuAspLysIleGlnGlnThrGlnAspLeuLeuLysLysIle

820  830  840  850  860  870  880  890  900 

CCTAACGATCnACITGAAATITAIAGTGAATIAGCAOGAGAAATCnrATnTACACATATAGATmGr^^ 
ProLysAspValLeuGluIleTyrSerGluLeuGlyGlyGluIleTyrPheThrAspIleAspLeuValGluHisLysGluLeuGlnAsp


910  920  930  940  |  950  960  970  980  990 

TTAAGTGAAGAACAGAAAAAlAGrAIGAAIAGTAGAOGlGAAAAAGTrCCGTITGCATCCU/miGlATrrGAAAAGAAAAOGGAAACA 


LeuSerGluGluGluLysAsnSerMetAsnSerArgGlyGluLysValProPheAlaSerArgPheValPheGluLysLysArgGluThr

1000  1010  1020  1030  {1040  1050  1060  1070  1080 

CGTAMTTMmTAAAIAI(AAAGAmTCX:MTTAATAGrrGAAdAAAGTAAAGAA(7IAXAmTGAAATr0GAAAG0GGAlT^ 
ProLysLeuIleIleAsnIleLysAspTyrAlaIleAsnSerGluGlnSerLysGluValTyrTyrGluIleGlyLysGlyIleSerLeu

1090  1100  1110  1120  1130  1140  1150  1160  ,  1170 

GATAmiAAGIAAGGATAAATCTCT/USATCCAGAGTriTIAAATnAATlAAGACnTIAAGCXATGATAGTGAm 
AspIleIleSerLysAspLysSerLeuAspProGluPheLeuAsnLeuIleLysSerLeuSerAspAspSerAspSerSerAspLeuLeu
1180  1190  1200  1210  1220  1230  1240  1250  1260

TITACTCAAAMTmAAGAGAAGaiAGAATTGAATMTAAAMnATACAXmMTITr^^ 

PheSerGlnLysPheLysGluLysLeuGluLeuAsnAsnLysSerIleAspIleAsnPheIleLysGluAsnLeuThrGluPheGlnHis

1270  1280  1290  1300  1310  1320  1330  1340  1350 

(XX7m‘ltVl'iAax;iTnt7riVOTATriT(X:ACCTCAa:Al!AGMOGGTAriAGA^^ 

AlaPheSerLeuAlaPheSerTyrTyrPheAlaProAspHisArgThrValLeuGluLeuTyrAlaProAspMetPheGluTyrMetAsn

1360  1370  1380  1390  1400  1410  1420  1430  1440 

AAGTIAGAAAAAGGOCXlATnGAGAAMlVUUSTGAAAGTnGAAGAAAa^A^^ 

LysLeuGluLysGlyGlyPheGluLysIleSerGluSerLeuLysLysGluGlyValGluLysAspArgIleAspValLeuLysGlyGlu

1450  1460  1470  1480  1490  1500  1510  1520  1530

AAAGCAm^AAAOCrrCAOGTmGTACCAGAACATGCASAIGCnTmAAAAAATIGCIV^G^^ 

LysAlaLeuLysAlaSerGlyLeuValProGluHisAlaAspAlaPheLysLysIleAlaArgGluLeuAsnThrTyrIleLeuPheArg

1540  1550  1560  1570  1580  1590  1600  1610  1620 

CXrrGTTAAIAAGTlMXnVVCAAACCmTTAAAAGTGGTGTOGCXACAMGGGATIGAATGMACAIO^ 

ProValAsnLysLeuAlaThrAsnLeuIleLysSerGlyValAlaThrLysGlyLeuAsnGluHisGlyLysSerSerAspTrpGlyPro

1630  1640  1650  1660  1670  1680  1690  1700  1710 

GDtfXTOGAIACATACCATTTGATCAAGAriTATCTAAGAAGCATOnXVW^CMTrAGClCTCr;^^ 

ValAlaGlyTyrIleProPheAspGlnAspLeuSerLysLysHisGlyGlnGlnLeuAlaValGluLysGlyAsnLeuGluAsnLysLys
1720  1730  1740  1750  1760  1770  1780  1790  1800 

tcmtxacm3agcaigaaggtgaaai7uxtaamtaccattamgti:aga(x;atitaagaaia^ 

SerIleThrGluHisGluGlyGluIleGlyLysIleProLeuLysLeuAspHisLeuArgIleGluGluLeuLysGluAsnGlyIleIle

1810  1820  1830  1840  1850  1860  1870  1880  1890 

inXMAE»GXAAAAAAGMAXlGAIAATGGXAAAA.VkI^TDm7Gr]7U:MATCX;MTMrCA0CnATATGMTr]^^ 
LeuLysGlyLysLysGluIleAspAsnGlyLysLysTyrTyrLeuLeuGluSerAsnAsnGlnValTyrGluPheArgIleSerAspGlu

1900  1910  1920  1930  1940  1950  1960  1970  1980 

MCAACGAAGTACAAIACAACACAAAACAAOGtAAAAmcnTrrmOGGGAAAAATrCMATIOGAGAAAIAIMGAAGIGATGGCl^ 
AsnAsnGluValGlnTyrLysThrLysGluGlyLysIleThrValLeuGlyGluLysPheAsnTrpArgAsnIleGluValMetAlaLys

1990  2000  2010  2020  2030  2040  2050  2060  2070 

MTGTAGAAGGGGTCnGAAGCOGTEM(:AGCrGACTATGAlTIAITnx:wnTGCaX»AGTriV^CAGAAA3^^ 
AsnValGluGlyValLeuLysProLeuThrAlaAspTyrAspLeuPheAlaLeuAlaProSerLeuThrGluIleLysLysGlnIlePro

2080  2090  2100  2110  2120  2130  2140  2150  2160 

Ai2AAAAAGAATGGyVTAAA£n7\GTIMCACCCCAAATIXXITAGAAAAGC;AAAMGCriGTDW7DWmTA3TGIATTAMm 
ThrLysArgMetAspLysValValAsnThrProAsnSerLeuGluLysGlnLysGlyValThrAsnLeuLeuIleLysTyrGlyIleGlu

2170  2180  2190  2200  2210  2220  2230  2240  2250 

AfiG»MCCGGArrcMCTAAGGGAACTmTCAAATIlXX::AAAAACAMTCXnTGATCX7rTrtilATCiW«X:AGT 
ArgLysProAspSerThrLysGlyThrLeuSerAsnTrpGlnLysGlnMetLeuAspArgLeuAsnGluAlaValLysTyrThrGlyTyr 

2260  2270  2280  2290  2300  2310  2320  2330  2340 

ACWXXXXXXATGTGGTTMCCATGGCACV^GAGCMGATAAlGAAGAGTrTCCTGAAAMGATAAa^MTnTTATAAa^ 
ThrGlyGlyAspValValAsnHisGlyThrGluGlnAspAsnGluGluPheProGluLysAspAsnGluIlePheIleIleAsnProGlu


2350  2360  2370  2380  2390  2400  2410  2420  2430

GCriCAATmTATIV^CDUW^AATIGOGAGAICACAOGXAGATmiAGAAAAAAACATr^^ 

GlyGluPheIleLeuThrLysAsnTrpGluMetThrGlyArgPheIleGluLysAsnIleThrGlyLysAspTyrLeuTyrTyrPheAsn

2440  2450  2460  2470  2480  2490  2500  2510  2520 

(XrrrarTATAATAAfiArPJSaXXJlXXrjJ^jaAfJJXTZATATrGf^^ 

ArgSerTyrAsnLysIleAlaProGlyAsnLysAlaTyrIleGluTrpThrAspProIleThrLysAlaLysIleAsnThrIleProThr

2530  2540  2550  2560  2570  2580  2590  2600  2610 

TCAGCAiGUVGTITAIAAAAAACmTCC4(;iATCA(:»VACAT(TICAAAIC^^ 

SerAlaGluPheIleLysAsnLeuSerSerIleArgArgSerSerAsnValGlyValTyrLysAspSerGlyAspLysAspGluPheAla

2620  2630  2640  2650  2660  2670  2680  2690  2700 

MAAAAGAAAGCGIGAAAAAMTIXX:AGCAT4TnGTCAGACT4TIACMTltAGCAAATCATATriTrXCX^ 
LysLysGluSerValLysLysIleAlaGlyTyrLeuSerAspTyrTyrAsnSerAlaAsnHisIlePheSerGlnGluLysLysArgLys

2710  2720  2730  2740  2750  2760  2770  2780  2790

AIATCAAIAXrrCGIGGAATCCAAOCQTCCAATGAAATIGAAMlXTITClT^AMTCr^^ 

IleSerIlePheArgGlyIleGlnAlaTyrAsnGluIleGluAsnValLeuLysSerLysGlnIleAlaProGluTyrLysAsnTyrPhe

2800  2810  2820  2830  2840  2850  2860  2870  2880 

CAATArrowWXSAAAGGAmCCMTCMGTrcMTrcCTrCIMCACATCAAAAATCTAATATT^ 
GlnTyrLeuLysGluArgIleThrAsnGlnValGlnLeuLeuLeuThrHisGlnLysSerAsnIleGluPheLysLeuLeuTyrLysGln

2890  2900  2910  2920  2930  2940  2950  2960  2970

TXMACrmCAGAAAAIGAAAOX^ATAAlTnCUVGGTCTIXXiAAAAAAmTIGAlCU^AAAAIAAAIAl^^ 
LeuAsnPheThrGluAsnGluThrAspAsnPheGluValPheGlnLysIleIleAspGluLys

2980  2990  3000  3010  3020  3030  3040  3050  3060 

AITCAICATrmAAGAAGACACIAOGAAmMTaGATGTATrGMIAGTIAIACTMlOGrrCTTCrATOGAC^^ 
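The translation rows in this appendix use run-together three-letter residue names, while Appendix II uses one-letter code. A minimal sketch of converting one into the other, which also flags rows that still contain scanning errors, might look like the following. The function name `three_to_one` is hypothetical; the mapping is the standard three-letter to one-letter amino acid code.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(translation: str) -> str:
    """Convert a run-together three-letter translation to one-letter code.

    Raises ValueError on any triplet that is not a standard residue name,
    so a row that still contains scanning errors is rejected, not guessed.
    """
    text = re.sub(r"\s+", "", translation)
    if len(text) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    residues = []
    for i in range(0, len(text), 3):
        code = text[i:i + 3]
        if code not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue {code!r} at index {i // 3}")
        residues.append(THREE_TO_ONE[code])
    return "".join(residues)


# Translation row printed under nucleotides 2170-2250 of the EF gene.
row = ("ArgLysProAspSerThrLysGlyThrLeuSerAsnTrpGlnLysGlnMetLeuAspArgLeu"
       "AsnGluAlaValLysTyrThrGlyTyr")
print(three_to_one(row))
```

Each full row covers 90 nucleotides, so a clean row converts to exactly 30 residues; a row that raises an error needs to be checked against the one-letter listing in Appendix II.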
APPENDIX II.  EF amino acid sequence


(33 aa signal peptide)  Start of mature EF (767 aa)

1  MTRNKFIPNKFSIISFSVLLFAISSSQAIEVNAMNEHYTESDIKRNHKTEKNKTEKEKFKDSINNLVKTE

71  FTNETLDKIQQTQDLLKKIPKDVLEIYSELGGEIYFTDIDLVEHKELQDLSEEEKNSMNSRGEKVPFASR 

141  FVFEKKRETPKLIINIKDYAINSEQSKEVYYEIGKGISLDIISKDKSLDPEFLNLIKSLSDDSDSSDLLF

#1  #2 


211  SQKFKEKLELNNKSIDINFIKENLTEPQHAFSIAFSYYFAPDHRTVLELYAPDMF

#3 


LEKGGFEKIS 


281  ESLKKEGVEKDRIDVLKGEKALKASGLVPEHADAFKKIARELNTYILFRPVNKLATNLIKSGVATKGLNE

(Potential  calmodulin  binding  site) 

351  HGKSSDWGPVAGYIPFDQDLSKKHGQQLAVEKGNLENKKSITEHEGEIGKIPLKLDHLRIEELKENGIIL 
(Putative  ATP  binding  site) 


421  KGKKEIDNGKKYYLLESNNQVYEFRISDENNEVQYKTKEGKITVLGEKFNWRNIEVMAKNVEGVLKPLTA 
491  DYDLFALAPSLTEIKKQIPTKRMDKVVNTPNSLEKQKGVTNLLIKYGIERKPDSTKGTLSNWQKQMLDRL
561  NEAVKYTGYTGGDVVNHGTEQDNEEFPEKDNEIFIINPEGEFILTKNWEMTGRFIEKNITGKDYLYYFNR
631  SYNKIAPGNKAYIEWTDPITKAKINTIPTSAEFIKNLSSIRRSSNVGVYKDSGDKDEFAKKESVKKIAGY
701  LSDYYNSANHIFSQEKKRKISIFRGIQAYNEIENVLKSKQIAPEYKNYFQYLKERITNQVQLLLTHQKSN
771  IEFKLLYKQLNFTENETDNFEVFQKIIDEK
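
The position labels in this listing count from the first residue of the precursor, so they include the signal peptide. A small sketch of the bookkeeping, assuming 70 residues per printed line (inferred from the labels 1, 71, 141, ...) and mature-protein numbering that starts immediately after the 33-residue signal peptide; both helper names are hypothetical:

```python
SIGNAL_LEN = 33       # signal peptide length stated for EF
PRECURSOR_LEN = 800   # total residues stated for the EF precursor
LINE_WIDTH = 70       # residues per printed line (inferred from the labels)


def line_label(line_index: int) -> int:
    """Precursor position of the first residue on a printed line (0-based line index)."""
    return 1 + LINE_WIDTH * line_index


def to_mature(precursor_pos: int) -> int:
    """Convert a precursor position into mature-EF numbering."""
    if not SIGNAL_LEN < precursor_pos <= PRECURSOR_LEN:
        raise ValueError("position lies in the signal peptide or outside the protein")
    return precursor_pos - SIGNAL_LEN


labels = [line_label(i) for i in range(12)]
print(labels)                           # labels of the twelve printed lines
print(to_mature(34), to_mature(800))    # first and last residues of mature EF
```

Under these assumptions the last mature residue is number 767, which agrees with the stated mature length.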


The sequence contains 800 amino acids (Mr 92,464):


Ala (A)    32      Leu (L)    69
Arg (R)    22      Lys (K)   103
Asn (N)    61      Met (M)     9
Asp (D)    44      Phe (F)    40
Cys (C)     0      Pro (P)    23
Gln (Q)    27      Ser (S)    55
Glu (E)    82      Thr (T)    39
Gly (G)    40      Trp (W)     5
His (H)    13      Tyr (Y)    34
Ile (I)    68      Val (V)    34

Acidic (Asp + Glu)                               126
Basic (Arg + Lys)                                125
Aromatic (Phe + Trp + Tyr)                        79
Hydrophobic (Aromatic + Ile + Leu + Met + Val)   259
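
The class totals in the EF composition table are sums of individual residue counts, so the table can be checked for internal consistency. The sketch below re-enters the printed counts and recomputes the totals; the names `EF_COUNTS`, `CLASSES` and `class_totals` are illustrative only.

```python
# Residue counts as printed in the EF composition table.
EF_COUNTS = {
    "Ala": 32, "Arg": 22, "Asn": 61, "Asp": 44, "Cys": 0,
    "Gln": 27, "Glu": 82, "Gly": 40, "His": 13, "Ile": 68,
    "Leu": 69, "Lys": 103, "Met": 9, "Phe": 40, "Pro": 23,
    "Ser": 55, "Thr": 39, "Trp": 5, "Tyr": 34, "Val": 34,
}

# Class membership exactly as labelled in the table.
CLASSES = {
    "Acidic": ("Asp", "Glu"),
    "Basic": ("Arg", "Lys"),
    "Aromatic": ("Phe", "Trp", "Tyr"),
    "Hydrophobic": ("Phe", "Trp", "Tyr", "Ile", "Leu", "Met", "Val"),
}


def class_totals(counts):
    """Sum the per-residue counts for each class in CLASSES."""
    return {name: sum(counts[res] for res in members)
            for name, members in CLASSES.items()}


total = sum(EF_COUNTS.values())
totals = class_totals(EF_COUNTS)
print(total, totals)
```

The individual counts add up to the stated 800 residues, and the four recomputed class totals match the printed values of 126, 125, 79 and 259.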

APPENDIX  III.  Homology  Comparison  between  EF  and  pertussis  cyclase. 


Calmodulin Site  ATP binding Site

. . .  *  *** 

289  EmiIIVLKCHCAlKASGLVPE»d:AIiaaAREIMmFRFVMaA)SajXSGVAIBIU«HX^^ 

I  :  II:  II  I  :|:|  |:  ::|l  II  :|  II  lllllll  I  llllll  Mill  :  :|||  I 

1  ^qq6H:)AGU^^»W»ESGIFAAVlJX3KAVAXEKNmlffRU»aifiTSUAEX^^ 

<-- Domain #1 -->


379  AVEKGNLEmCSITEKEmCiaFL  K  lJ»IiaEEIJ<ENCini<l3Q0E3IM9<Xym£SM17^^ 

I  II  :  I  !  II  II  I  I  I  I  I  I  I  I  II  III:  I  I 

91  EVIARA£NMtSSIAH3{IAVIXII£X£RLIMR>^  FRVKE  TSDGmVQaOK  G 

<-- Domain #2 -->

466  CEXFNUOnEmAENVEIMia?^^  SIIEIKIQIFIXEMCKV  VNT  RSLEiqXVINIII  KRUER  KFDST 

|:  I  :  I  |:  I  llltl  |:||::|  I  Mill:  I  :  I  I  I 

168  COX'  EAVKV  ICNAAG  IFmDIIMFAlMM2^Qla£ARSS^^S(XSV^MAR^^R^^ 

<-- Domain #3 -->

546  KJi  USINE  AVKYIGmC  D0VI«;m}n«£EPEXnCU^  FIUKNUEMnSFlEKNrr 

II  I  I  I  I  I:  I  ll|:|||||:|  III  |:  l|::  II  :||:  :|  : 

253  VGIEARIqEEaX3mG^^^DFELEVR^WH«AHA\C^^  PFPEM»aFWSA3l5S(JlUI«J  IXEnOQQ  R 


621  (9axainst«suiaAK3i»Yi£t>m«  irxAiONriPiSAEFixNi^siKtssNVGvuasGEiajEiMaaEswia/^^ 

I  |::|  l|:l  I  I  I  :  :||  I  I  III  I  t  I  :|l  I 

339  G8SWFaiIMl!GVAaCSlHn3I»MPGVFS(SSKF5PIXaEIVIMSRaPRFSIGAVE^^  MAA 


709  NHOSqpCKHKISIPHGiqAINEIENVlKSKJIAPEaiNmWXEWlHC^iq^UHJCSNI^^ 

:  I  I  :  I  I  I  I  I  :|  I  |:  I  I  II 

426  VEAAEUEKIRT/lHMiABqDQf^  FGV  SGASAWGqpAU) 


1.  Domains #1, #2 and #3 represent three highly conserved amino acid
domains in EF (top line of each pair) and the pertussis cyclase (bottom
line in each pair).

2.  The numbers to the left of each line indicate the amino acid position
for the EF precursor or the pertussis cyclase.

3.  The asterisks (*) indicate the consensus sequences for the ATP binding
site for EF and the pertussis cyclase.
APPENDIX IV:  Nucleotide Sequence of the LF gene.


10  20  30  40  50  60  70  80  90

AAATTAGGATiraxnTATGTmCTATITrmAAAATAATAGTATrAAATAGrnXAATGCAAATGATAAATGGCXnni:^ 

100  110  120  130  140  150  160  170  180

AATGAAATAATCTACAAATGGAATITCTCCAGTmAGAriAAACCATACCAAAAAAATGA(:A(nXnX>VAGAAAAATCAlAC3AATCCC^ 

190  200  210  220  230  240  250  260  270

CACTAATrAA(V^TAACQU\ATlXXnAGriTATAOC;DV3AAACrTATmTntnATAATACCATa:AAAAAAG^^ 

280  290  300  310  320  330  340  350  360

CTATrm£nAAAmTTrAGCAAGrAAATITia7Kn:ATAAACAAAGTmTCrmTATAAAAAAm(TriA(rnTI^^ 

370  380  390  400  410  420  430  440  450

AAATCAAAAATrrTmTGAQ^AGAAATArrGCCTITAATnAlXMOGAAATAAiGriAAAATTTTCTACATACTmTrtTAT^^ 


460  470  480  490  500  510  520  530  540

TgnCACmTAAAAAACCAGAGATTAAATATGAATATAAAAAAAGAATmTAAAACTAATTAglATGTCATglTrACrAACAGCAATr 
(r.b.s.)  MetAsnIleLysLysGluPheIleLysValIleSerMetSerCysLeuValThrAlaIle

(33  amino  acid  signal  peptide) 


550  560  570  580  590  600  610  620  630 

f£TrtGfieTOGTaXXriXTnTATCCCCCTT(ItACMCXXXXOCXXXXnX^^ 
ThrLeuSerGlyProValPheIleProLeuValGlnGly

|-> 1 of mature LF


640  650  660  670  680  690  700  710  720

AAAGATCACAATAACAGAAAAGAICMACMCX^AAAIAAAACACAa^AAGAaAITIAAAaUkAATCATGAAACAC^ 
LysAspGluAsnLysArgLysAspGluGluArgAsnLysThrGlnGluGluHisLeuLysGluIleMetLysHisIleValLysIleGlu

730  740  750  760  770  780  790  800  810

GTAAAAOGOGAGGAACCIXnTAAAAAACAG(X>(XACAAAAGCIACTItAGAAA(nAC(>TTnX»^Tt^^ 
ValLysGlyGluGluAlaValLysLysGluAlaAlaGluLysLeuLeuGluLysValProSerAspValLeuGluMetTyrLysAlaIle

820  830  840  850  860  870  880  890  900

OGAGGAAACATATAIATIXmX»VTtXritATATlACAAAACATAIATaTl»GAAa:AmTCTGAAGAIAAGA 
GlyGlyLysIleTyrIleValAspGlyAspIleThrLysHisIleSerLeuGluAlaLeuSerGluAspLysLysLysIleLysAspIle

910  920  930  940  950  960  970  980  990

TATGOGAAAGATGCTTTAmCATGAACAmTGTATATGCAAAACAAaATATa^ACCCCTACnTCTAATCCAATCTr^^ 
TyrGlyLysAspAlaLeuLeuHisGluHisTyrValTyrAlaLysGluGlyTyrGluProValLeuValIleGlnSerSerGluAspTyr

1000  1010  1020  1030  1040  1050  1060  1070  1080

(H'AGAAAATACTGAAAAGGCACTCAAanTrATrATCAAATAOCnAAGATAmTCAAGGGATATTOAAGTAAAATrAATCAACCATAT 
ValGluAsnThrGluLysAlaLeuAsnValTyrTyrGluIleGlyLysIleLeuSerArgAspIleLeuSerLysIleAsnGlnProTyr

1090  1100  1110  1120  1130  1140  1150  1160  1170

CAGAAATnTTAGATGTATrAAATACCATTAAAAATGCATCTGATrCAGATGGAtiAAGATCTrTTATmCTAATCAGCr^^ 
GlnLysPheLeuAspValLeuAsnThrIleLysAsnAlaSerAspSerAspGlyGlnAspLeuLeuPheThrAsnGlnLeuLysGluHis
1180  1190  1200  1210  1220  1230  1240  1250  1260

<XX:ACAGACTnTCTCTAGAATrCTTGGMCAAAATA(X:AAIGAOGTA(:yUU3AAGri^^ 

ProThrAspPheSerValGluPheLeuGluGlnAsnSerAsnGluValGlnGluValPheAlaLysAlaPheAlaTyrTyrIleGluPro

1270  1280  1290  1300  1310  1320  1330  1340  1350 

CAGCATOGTGA3X?ITmCAGCnTlATCCACXXXyw\GCTmMTI7«MXXATAAATTmOGMCAAGAAATAMTCTATCCnXX^ 
GlnHisArgAspValLeuGlnLeuTyrAlaProGluAlaPheAsnTyrMetAspLysPheAsnGluGlnGluIleAsnLeuSerLeuGlu

1360  1370  1380  1390  1400  1410  1420  1430  1440 

GAACTIAAAGATCAACGGAlGCTGTCAAGATATCAAAAATOGGAAAACAlAAMCiUXl^CXATCAACAC^^ 
GluLeuLysAspGlnArgMetLeuSerArgTyrGluLysTrpGluLysIleLysGlnHisTyrGlnHisTrpSerAspSerLeuSerGlu

1450  1460  1470  1480  1490  1500  1510  1520  1530 

GMACGAAGAGGA(nTnAAAAAAGCTGCAGATlXX:rAriGAG0CA4AGAAAGATGACATAATICATT^^ 
GluGlyArgGlyLeuLeuLysLysLeuGlnIleProIleGluProLysLysAspAspIleIleHisSerLeuSerGlnGluGluLysGlu

1540  1550  1560  1570  1580  1590  1600  1610  1620 

CTrCTAAAAAGAAIACAAATIGAIAGTAGTCATriTTlMXniACTGAGCAAAAAGAGr^^ 

LeuLeuLysArgIleGlnIleAspSerSerAspPheLeuSerThrGluGluLysGluPheLeuLysLysLeuGlnIleAspIleArgAsp

1630  1640  1650  1660  1670  1680  1690  1700  1710 

TXnTIATCTGMGAAGAAAAAGAGCTIT17WW^TAGAATACAGGria3ATAGTAGTAATCCnTrAT(TCAAAMGAAAMGAGTrm 
SerLeuSerGluGluGluLysGluLeuLeuAsnArgIleGlnValAspSerSerAsnProLeuSerGluLysGluLysGluPheLeuLys

1720  1730  1740  1750  1760  1770  1780  1790  1800 

AAGCTGAAACTIGATATrCAACCATATCATATEAATCAAAOGTItXaUVGATACAGGAGOGTEAATrcAmGrrCCGrrCAATTAATC]^ 
LysLeuLysLeuAspIleGlnProTyrAspIleAsnGlnArgLeuGlnAspThrGlyGlyLeuIleAspSerProSerIleAsnLeuAsp

1810  1820  1830  1840  1850  1860  1870  1880  1890 

GTAACAAAOCACnATAAAAOOGAIATICAAAAIATIGATGCTnAriACATCM^IYXATroGAACnmTItTlAC^^ 
ValArgLysGlnTyrLysArgAspIleGlnAsnIleAspAlaLeuLeuHisGlnSerIleGlySerThrLeuTyrAsnLysIleTyrLeu

1900  1910  1920  1930  1940  1950  1960  1970  1980 

TATGAAAATATGAATATCMTAACCTIAQWXMCCCTADCmXXXaaTrACTrGATKmrrGAEAATACTA^ 
TyrGluAsnMetAsnIleAsnAsnLeuThrAlaThrLeuGlyAlaAspLeuValAspSerThrAspAsnThrLysIleAsnArgGlyIle

1990  2000  2010  2020  2030  2040  2050  2060  2070 

TTCAAreAAmAAAAAAAATriCAAAIAIAGIATTTCTAGriT^AaAlAIGATrcrrGMATAAATCAAAOGCXTO 
PheAsnGluPheLysLysAsnPheLysTyrSerIleSerSerAsnTyrMetIleValAspIleAsnGluArgProAlaLeuAspAsnGlu

2080  2090  2100  2110  2120  2130  2140  2150  2160 

amTGAAATa;;AGAATCCAATTATCACCAGATACra;AG<>OGAIATmGAAAATOGAAAOCTrATATlACAAAGAAACATCGGTCTG 
ArgLeuLysTrpArgIleGlnLeuSerProAspThrArgAlaGlyTyrLeuGluAsnGlyLysLeuIleLeuGlnArgAsnIleGlyLeu

2170  2180  2190  2200  2210  2220  2230  2240  2250

GAAATAAAOGATGTACAAATMTTAAGCMTCXX^AAAAAGAATATATTWtfXATrGATGOGaUWtfriACTIGCCA/W^GAiGTAAAATAGATACA 
GluIleLysAspValGlnIleIleLysGlnSerGluLysGluTyrIleArgIleAspAlaLysValValProLysSerLysIleAspThr

2260  2270  2280  2290  2300  2310  2320  2330  2340 

AAAATrCAAGAAGCACAGTTAAATATAAATCAGGAATGGAATAAAGCATrAGOGTrACCAAAATATACAAAGCmmCATTCAACGTG 
LysIleGlnGluAlaGlnLeuAsnIleAsnGlnGluTrpAsnLysAlaLeuGlyLeuProLysTyrThrLysLeuIleThrPheAsnVal
2350  2360  2370  2380  2390  2400  2410  2420  2430

CATAATAGAXATCXIATCCAATATKnTtfy^AAffltXnTATrTAATATTGAATGMlXX;^^ 

HisAsnArgTyrAlaSerAsnIleValGluSerAlaTyrLeuIleLeuAsnGluTrpLysAsnAsnIleGlnSerAspLeuIleLysLys

2440  2450  2460  2470  2480  2490  2500  2510  2520 

ValThrAsnTyrLeuValAspGlyAsnGlyArgPheValPheThrAspIleThrLeuProAsnIleAlaGluGlnTyrThrHisGlnAsp

2530  2540  2550  2560  2570  2580  2590  2600  2610

GAGAlATAlX3AGCAAGTrCATTCAAAAG0GTTAIATGTIXX»CAAICXXX?riCIAIAmCTCCAI^^ 
GluIleTyrGluGlnValHisSerLysGlyLeuTyrValProGluSerArgSerIleLeuLeuHisGlyProSerLysGlyValGluLeu

2620  2630  2640  2650  2660  2670  2680  2690  2700 

AGGAAIGATAGrTCAOGGTTmtACAOGMTritXlACATGCnXTKXMXlATTAIGCTGGATATCTATIAGA^ 
ArgAsnAspSerGluGlyPheIleHisGluPheGlyHisAlaValAspAspTyrAlaGlyTyrLeuLeuAspLysAsnGlnSerAspLeu

2710  2720  2730  2740  2750  2760  2770  2780  2790 

GrmCAMTrcnVW^AAMTrCATrGATATTTrTMGGAAGMGGGAGriMTmACTTOGrATGOGAGM 
ValThrAsnSerLysLysPheIleAspIlePheLysGluGluGlySerAsnLeuThrSerTyrGlyArgThrAsnGluAlaGluPhePhe

2800  2810  2820  2830  2840  2850  2860  2870  2880 

GCAGAAGCCTITAGGrmATGCATIXnAOGGACCATGCTGMOCnTIi\AAAGTrQ\AAAAMTCOTCCGAAMC:rrTCCAAm 
AlaGluAlaPheArgLeuMetHisSerThrAspHisAlaGluArgLeuLysValGlnLysAsnAlaProLysThrPheGlnPheIleAsn

2890  2900  2910  2920  2930  2940  2950  2960  2970 

GATCAGATl^^AGTrCATImAAC^CAIAACTMTGT4T^AAAAmTTCAAAT0GAmMTAATMT^^ 
AspGlnIleLysPheIleIleAsnSer

2980  2990  3000  3010  3020  3030  3040  3050  3060 

ACGAGCCATIAIGAAGCAACCAATrcTAGACTIGAIAGlTWaTCrriaXUWTCACCMGA^^ 


3070  3080  3090  3100  3110  3120  3130  3140  3150 

TlTIArcrrGTTCX?ITAGATATGAAOGCi\AAMCMTGATC{nX»C(nAG«WtfnriMTGAmT^ 

3160  3170  3180  3190  3200  3210  3220  3230  3240 

GGAATATEAGrrAAAAGTCCCGAAAWMXXTOTIOCMAGCTITia^AAGAACmmTTCIMCAAGT^^ 

3250  3260  3270  3280  3290 

TTCMTAMTITIGTV^TITWVCXa^TACXnrAAAAAACCGAAAT^^ 

SstI 
APPENDIX V.  LF amino acid sequence


(29 aa signal peptide)  Start of mature LF (780 aa)

1  MNIKKEFIKVISMSCLVTAITLSGPVFIPLVQGAGGHGDVGMHVKEKEKNKDENKRKDEERNKTQEEHLK


71  EIMKHIVKIEVKGEEAVKKEAAEKLLEKVPSDVLEMYKAIGGKIYIVDGDITKHISLEALSEDKKKIKDI

141  YGKDALLHEHYVYAKEGYEPVLVIQSSEDYVENTEKALNVYYEIGKILSRDILSKINQPYQKFLDVLNTI

211  KNASDSDGQDLLFTNQLKEHPTDFSVEFLEQNSNEVQEVFAKAFAYYIEPQHRDVLQLYAPEAFNYMDKF


281  NEQEINLSLEELKDQRMLSRYEKWEKIKQHYQHWSDSLSEEGRGLLKKLQIPIEPKKDDIIHSLSQEEKE
351  LLKRIQIDSSDFLSTEEKEFLKKLQIDIRDSLSEEEKELLNRIQVDSSNPLSEKEKEFLKKLKLDIQPYD
421  INQRLQDTGGLIDSPSINLDVRKQYKRDIQNIDALLHQSIGSTLYNKIYLYENMNINNLTATLGADLVDS
491  TDNTKINRGIFNEFKKNFKYSISSNYMIVDINERPALDNERLKWRIQLSPDTRAGYLENGKLILQRNIGL
561  EIKDVQIIKQSEKEYIRIDAKVVPKSKIDTKIQEAQLNINQEWNKALGLPKYTKLITFNVHNRYASNIVE
631  SAYLILNEWKNNIQSDLIKKVTNYLVDGNGRFVFTDITLPNIAEQYTHQDEIYEQVHSKGLYVPESRSIL
701  LHGPSKGVELRNDSEGFIHEFGHAVDDYAGYLLDKNQSDLVTNSKKFIDIFKEEGSNLTSYGRTNEAEFF
771  AEAFRLMHSTDHAERLKVQKNAPKTFQFINDQIKFIINS


The sequence contains 809 amino acids (Mr 93,798):


Ala (A)    34      Leu (L)    80
Arg (R)    27      Lys (K)    86
Asn (N)    54      Met (M)    10
Asp (D)    55      Phe (F)    29
Cys (C)     1      Pro (P)    21
Gln (Q)    41      Ser (S)    54
Glu (E)    79      Thr (T)    28
Gly (G)    35      Trp (W)     5
His (H)    21      Tyr (Y)    35
Ile (I)    74      Val (V)    40

Acidic (Asp + Glu)                               134
Basic (Arg + Lys)                                113
Aromatic (Phe + Trp + Tyr)                        69
Hydrophobic (Aromatic + Ile + Leu + Met + Val)   273
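
A composition table of this kind is a tally of the one-letter sequence. As a rough sketch of how such a tally could be recomputed, the code below counts residues in a short stretch of LF, the last 39 residues (771 to 809) as read from the codon translation in Appendix IV. The function name `composition` is hypothetical, and the fragment is only an illustration, not the full 809-residue protein.

```python
from collections import Counter

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def composition(seq: str) -> dict:
    """Count each of the 20 standard residues in a one-letter sequence."""
    cleaned = "".join(seq.split()).upper()
    unknown = set(cleaned) - set(ONE_TO_THREE)
    if unknown:
        # Characters such as O, X or Z usually indicate a scanning error.
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    counts = Counter(cleaned)
    return {ONE_TO_THREE[c]: counts.get(c, 0) for c in ONE_TO_THREE}


# Residues 771-809 of the LF precursor (C-terminal end).
fragment = "AEAFRLMHSTDHAERLKVQKNAPKTFQFINDQIKFIINS"
comp = composition(fragment)
print(sum(comp.values()), comp)
```

Running the same function over the full one-letter listing and comparing the result with the printed table is a quick way to locate lines in which a scanned character was misread.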